Hands-on_Ex07

Author

Gao Ya

Modified

March 8, 2024

1. Choropleth Mapping with R

1.1 Overview

Choropleth mapping involves the symbolisation of enumeration units, such as countries, provinces, states, counties or census units, using area patterns or graduated colors. For example, a social scientist may need to use a choropleth map to portray the spatial distribution of aged population of Singapore by Master Plan 2014 Subzone Boundary.

1.2 Load Packages

pacman::p_load(sf, tmap, tidyverse)

1.3 Importing Geospatial Data into R

The code chunk below uses the st_read() function of sf package to import MP14_SUBZONE_WEB_PL shapefile into R as a simple feature data frame called mpsz.

mpsz <- st_read(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\GaoYa98\ISSS608-VAA\Hands-on_Ex\Hands-on_Ex07\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

1.4 Importing Attribute Data into R

Next, we will import respopagsex2011to2020.csv file into RStudio and save the file into an R dataframe called popagsex.

The task will be performed by using read_csv() function of readr package as shown in the code chunk below.

popdata <- read_csv("data/aspatial/respopagesextod2011to2020.csv")
Rows: 984656 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): PA, SZ, AG, Sex, TOD
dbl (2): Pop, Time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

1.5 Data Preparation

1.5.1 Data wrangling

The following data wrangling and transformation functions will be used:

pivot_wider() of tidyr package, and mutate(), filter(), group_by() and select() of dplyr package

Show the code
popdata2020 <- popdata %>%
  filter(Time == 2020) %>%
  group_by(PA, SZ, AG) %>%
  summarise(POP = sum(Pop), .groups = "drop") %>%
  ungroup() %>%
  pivot_wider(names_from = AG, values_from = POP, names_prefix = "") %>%
  mutate(YOUNG = rowSums(.[3:6]) + rowSums(.[12]),
         `ECONOMY ACTIVE` = rowSums(.[7:11]) + rowSums(.[13:15]),
         AGED = rowSums(.[16:21]),
         TOTAL = rowSums(.[3:21]),
         DEPENDENCY = (YOUNG + AGED) / `ECONOMY ACTIVE`)

1.5.2 Joining the attribute data and geospatial data

Before we can perform the georelational join, one extra step is required to convert the values in PA and SZ fields to uppercase. This is because the values of PA and SZ fields are made up of upper- and lowercase. On the other, hand the SUBZONE_N and PLN_AREA_N are in uppercase.

Show the code
popdata2020 <- popdata2020 %>%
  mutate_at(.vars = vars(PA, SZ), 
          .funs = funs(toupper)) %>%
  filter(`ECONOMY ACTIVE` > 0)
Warning: `funs()` was deprecated in dplyr 0.8.0.
ℹ Please use a list of either functions or lambdas:

# Simple named list: list(mean = mean, median = median)

# Auto named with `tibble::lst()`: tibble::lst(mean, median)

# Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

Next, left_join() of dplyr is used to join the geographical data and attribute table using planning subzone name e.g. SUBZONE_N and SZ as the common identifier.

Show the code
mpsz_pop2020 <- left_join(mpsz, popdata2020,
                          by = c("SUBZONE_N" = "SZ"))
write_rds(mpsz_pop2020, "data/rds/mpszpop2020.rds")

1.6 Choropleth Mapping Geospatial Data Using tmap

Two approaches can be used to prepare thematic map using tmap, they are:

Plotting a thematic map quickly by using qtm(). Plotting highly customisable thematic map by using tmap elements.

1.6.1 Plotting a choropleth map quickly by using qtm()

The easiest and quickest to draw a choropleth map using tmap is using qtm(). It is concise and provides a good default visualisation in many cases.

The code chunk below will draw a cartographic standard choropleth map as shown below.

Show the code
library(tmap)
tmap_mode("plot")
tmap mode set to plotting
Show the code
qtm(mpsz_pop2020, 
    fill = "DEPENDENCY")

1.6.2 Creating a choropleth map by using tmap’s elements

Despite its usefulness of drawing a choropleth map quickly and easily, the disadvantge of qtm() is that it makes aesthetics of individual layers harder to control. To draw a high quality cartographic choropleth map as shown in the figure below, tmap’s drawing elements should be used.

Show the code
tm_shape(mpsz_pop2020)+
  tm_fill("DEPENDENCY", 
          style = "quantile", 
          palette = "Blues",
          title = "Dependency ratio") +
  tm_layout(main.title = "Distribution of Dependency Ratio by planning subzone",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from Urban Redevelopment Authorithy (URA)\n and Population data from Department of Statistics DOS", 
             position = c("left", "bottom"))

1.6.3 Data classification methods of tmap

Most choropleth maps employ some methods of data classification. The point of classification is to take a large number of observations and group them into data ranges or classes.

tmap provides a total ten data classification methods, namely: fixed, sd, equal, pretty (default), quantile, kmeans, hclust, bclust, fisher, and jenks.

To define a data classification method, the style argument of tm_fill() or tm_polygons() will be used.

1.6.3.1 Plotting choropleth maps with built-in classification methods

Show the code
tm_shape(mpsz_pop2020)+
  tm_fill("DEPENDENCY",
          n = 5,
          style = "jenks") +
  tm_borders(alpha = 0.5)

1.6.3.2 Plotting choropleth map with custome break

Show the code
tm_shape(mpsz_pop2020)+
  tm_fill("DEPENDENCY",
          breaks = c(0, 0.60, 0.70, 0.80, 0.90, 1.00)) +
  tm_borders(alpha = 0.5)
Warning: Values have found that are higher than the highest break

1.6.4 Colour Scheme

tmap supports colour ramps either defined by the user or a set of predefined colour ramps from the RColorBrewer package.

Show the code
tm_shape(mpsz_pop2020)+
  tm_fill("DEPENDENCY",
          style = "quantile",
          palette = "-Greens") +
  tm_borders(alpha = 0.5)

1.6.5 Map Layouts

tmap also also provides arguments to draw other map furniture such as compass, scale bar and grid lines.

In the code chunk below, tm_compass(), tm_scale_bar() and tm_grid() are used to add compass, scale bar and grid lines onto the choropleth map.

Show the code
tm_shape(mpsz_pop2020)+
  tm_fill("DEPENDENCY", 
          style = "quantile", 
          palette = "Blues",
          title = "No. of persons") +
  tm_layout(main.title = "Distribution of Dependency Ratio \nby planning subzone",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar(width = 0.15) +
  tm_grid(lwd = 0.1, alpha = 0.2) +
  tm_credits("Source: Planning Sub-zone boundary from Urban Redevelopment Authorithy (URA)\n and Population data from Department of Statistics DOS", 
             position = c("left", "bottom"))

1.6.6 Drawing Small Multiple Choropleth Maps

Small multiple maps, also referred to as facet maps, are composed of many maps arrange side-by-side, and sometimes stacked vertically. Small multiple maps enable the visualisation of how spatial relationships change with respect to another variable, such as time.

In tmap, small multiple maps can be plotted in three ways:

  • by assigning multiple values to at least one of the asthetic arguments,
  • by defining a group-by variable in tm_facets(), and
  • by creating multiple stand-alone maps with tmap_arrange().
Show the code
tm_shape(mpsz_pop2020) +
  tm_fill("DEPENDENCY",
          style = "quantile",
          palette = "Blues",
          thres.poly = 0) + 
  tm_facets(by="REGION_N", 
            free.coords=TRUE, 
            drop.shapes=FALSE) +
  tm_layout(legend.show = FALSE,
            title.position = c("center", "center"), 
            title.size = 20) +
  tm_borders(alpha = 0.5)
Warning: The argument drop.shapes has been renamed to drop.units, and is
therefore deprecated

2. Visualising Geospatial Point Data

2.1 Overview

Proportional symbol maps (also known as graduate symbol maps) are a class of maps that use the visual variable of size to represent differences in the magnitude of a discrete, abruptly changing phenomenon, e.g. counts of people. Like choropleth maps, you can create classed or unclassed versions of these maps. The classed ones are known as range-graded or graduated symbols, and the unclassed are called proportional symbols, where the area of the symbols are proportional to the values of the attribute being mapped. In this hands-on exercise, you will learn how to create a proportional symbol map showing the number of wins by Singapore Pools’ outlets using an R package called tmap.

2.2 Load Packages

Before we get started, we need to ensure that tmap package of R and other related R packages have been installed and loaded into R.

pacman::p_load(sf, tmap, tidyverse)

2.3 Data Import and Preparation

The code chunk below uses read_csv() function of readr package to import SGPools_svy21.csv into R as a tibble data frame called sgpools.

sgpools <- read_csv("data/aspatial/SGPools_svy21.csv")
Rows: 306 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): NAME, ADDRESS, OUTLET TYPE
dbl (4): POSTCODE, XCOORD, YCOORD, Gp1Gp2 Winnings

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The code chunk below converts sgpools data frame into a simple feature data frame by using st_as_sf() of sf packages

sgpools_sf <- st_as_sf(sgpools, 
                       coords = c("XCOORD", "YCOORD"),
                       crs= 3414)

##2.4 Drawing Proportional Symbol Map To create an interactive proportional symbol map in R, the view mode of tmap will be used.

The code churn below will turn on the interactive mode of tmap.

tmap_mode("view")
tmap mode set to interactive viewing

2.4.1 Interactive point symbol map

The code chunks below are used to create an interactive point symbol map.

Show the code
tm_shape(sgpools_sf)+
tm_bubbles(col = "red",
           size = 1,
           border.col = "black",
           border.lwd = 1)

2.4.2 make it proportional

To draw a proportional symbol map, we need to assign a numerical variable to the size visual attribute. The code chunks below show that the variable Gp1Gp2Winnings is assigned to size visual attribute.

Show the code
tm_shape(sgpools_sf)+
tm_bubbles(col = "red",
           size = "Gp1Gp2 Winnings",
           border.col = "black",
           border.lwd = 1)
Legend for symbol sizes not available in view mode.

2.4.3 Lets give it a different colour

The proportional symbol map can be further improved by using the colour visual attribute. In the code chunks below, OUTLET_TYPE variable is used as the colour attribute variable.

Show the code
tm_shape(sgpools_sf)+
tm_bubbles(col = "OUTLET TYPE", 
          size = "Gp1Gp2 Winnings",
          border.col = "black",
          border.lwd = 1)
Legend for symbol sizes not available in view mode.

3. Analytical Mapping

This in-class exercise helps to gain hands-on experience on using appropriate R methods to plot analytical maps. ## 3.1 Load Packages

pacman::p_load(tmap, tidyverse, sf)

3.2 Importing data

For the purpose of this hands-on exercise, a prepared data set called NGA_wp.rds will be used. The data set is a polygon feature data.frame providing information on water point of Nigeria at the LGA level. You can find the data set in the rds sub-direct of the hands-on data folder.

NGA_wp <- read_rds("data/rds/NGA_wp.rds")

3.3 Basic Choropleth Mapping

Visualising distribution of non-functional water point

Show the code
p1 <- tm_shape(NGA_wp) +
  tm_fill("wp_functional",
          n = 10,
          style = "equal",
          palette = "Blues") +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Distribution of functional water point by LGAs",
            legend.outside = FALSE)

p2 <- tm_shape(NGA_wp) +
  tm_fill("total_wp",
          n = 10,
          style = "equal",
          palette = "Blues") +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Distribution of total  water point by LGAs",
            legend.outside = FALSE)

tmap_arrange(p2, p1, nrow = 1)

3.4 Choropleth Map for Rates

In much of our readings we have now seen the importance to map rates rather than counts of things, and that is for the simple reason that water points are not equally distributed in space. That means that if we do not account for how many water points are somewhere, we end up mapping total water point size rather than our topic of interest.

3.4.1 Deriving Proportion of Functional Water Points and Non-Functional Water Points

We will tabulate the proportion of functional water points and the proportion of non-functional water points in each LGA. In the following code chunk, mutate() from dplyr package is used to derive two fields, namely pct_functional and pct_nonfunctional.

Show the code
NGA_wp <- NGA_wp %>%
  mutate(pct_functional = wp_functional/total_wp) %>%
  mutate(pct_nonfunctional = wp_nonfunctional/total_wp)

3.4.2 Plotting map of rate

Plot a choropleth map showing the distribution of percentage functional water point by LGA

Show the code
tm_shape(NGA_wp) +
  tm_fill("pct_functional",
          n = 10,
          style = "equal",
          palette = "Blues",
          legend.hist = TRUE) +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Rate map of functional water point by LGAs",
            legend.outside = TRUE)

3.5 Extreme Value Maps

Extreme value maps are variations of common choropleth maps where the classification is designed to highlight extreme values at the lower and upper end of the scale, with the goal of identifying outliers. These maps were developed in the spirit of spatializing EDA, i.e., adding spatial features to commonly used approaches in non-spatial EDA (Anselin 1994).

3.5.1 Percentile Map

The percentile map is a special type of quantile map with six specific categories: 0-1%,1-10%, 10-50%,50-90%,90-99%, and 99-100%. The corresponding breakpoints can be derived by means of the base R quantile command, passing an explicit vector of cumulative probabilities as c(0,.01,.1,.5,.9,.99,1). Note that the begin and endpoint need to be included.

3.5.1.1 Data Preparation

Show the code
NGA_wp <- NGA_wp %>%
  drop_na()

percent <- c(0,.01,.1,.5,.9,.99,1)
var <- NGA_wp["pct_functional"] %>%
  st_set_geometry(NULL)
quantile(var[,1], percent)
       0%        1%       10%       50%       90%       99%      100% 
0.0000000 0.0000000 0.2169811 0.4791667 0.8611111 1.0000000 1.0000000 

3.5.1.2 Creating the get.var function

Firstly, we will write an R function as shown below to extract a variable (i.e. wp_nonfunctional) as a vector out of an sf data.frame.

arguments:

  • vname: variable name (as character, in quotes)

  • df: name of sf data frame

returns:

  • v: vector with values (without a column name)
Show the code
get.var <- function(vname,df) {
  v <- df[vname] %>% 
    st_set_geometry(NULL)
  v <- unname(v[,1])
  return(v)
}

3.5.1.3 A percentile mapping function

Next, we will write a percentile mapping function by using the code chunk below

Show the code
percentmap <- function(vnam, df, legtitle=NA, mtitle="Percentile Map"){
  percent <- c(0,.01,.1,.5,.9,.99,1)
  var <- get.var(vnam, df)
  bperc <- quantile(var, percent)
  tm_shape(df) +
  tm_polygons() +
  tm_shape(df) +
     tm_fill(vnam,
             title=legtitle,
             breaks=bperc,
             palette="Blues",
          labels=c("< 1%", "1% - 10%", "10% - 50%", "50% - 90%", "90% - 99%", "> 99%"))  +
  tm_borders() +
  tm_layout(main.title = mtitle, 
            title.position = c("right","bottom"))
}

3.5.1.4 Test drive the percentile mapping function

To run the function, type the code chunk as shown below.

percentmap("total_wp", NGA_wp)

3.5.1 Box map

In essence, a box map is an augmented quartile map, with an additional lower and upper category. When there are lower outliers, then the starting point for the breaks is the minimum value, and the second break is the lower fence. In contrast, when there are no lower outliers, then the starting point for the breaks will be the lower fence, and the second break is the minimum value (there will be no observations that fall in the interval between the lower fence and the minimum value).

Show the code
ggplot(data = NGA_wp,
       aes(x = "",
           y = wp_nonfunctional)) +
  geom_boxplot()

Displaying summary statistics on a choropleth map by using the basic principles of boxplot.

To create a box map, a custom breaks specification will be used. However, there is a complication. The break points for the box map vary depending on whether lower or upper outliers are present.

3.5.2.1 Creating the boxbreaks function

The code chunk below is an R function that creating break points for a box map.

arguments: - v: vector with observations - mult: multiplier for IQR (default 1.5) returns: - bb: vector with 7 break points compute quartile and fences

Show the code
boxbreaks <- function(v,mult=1.5) {
  qv <- unname(quantile(v))
  iqr <- qv[4] - qv[2]
  upfence <- qv[4] + mult * iqr
  lofence <- qv[2] - mult * iqr
  # initialize break points vector
  bb <- vector(mode="numeric",length=7)
  # logic for lower and upper fences
  if (lofence < qv[1]) {  # no lower outliers
    bb[1] <- lofence
    bb[2] <- floor(qv[1])
  } else {
    bb[2] <- lofence
    bb[1] <- qv[1]
  }
  if (upfence > qv[5]) { # no upper outliers
    bb[7] <- upfence
    bb[6] <- ceiling(qv[5])
  } else {
    bb[6] <- upfence
    bb[7] <- qv[5]
  }
  bb[3:5] <- qv[2:4]
  return(bb)
}

3.5.2 2 Creating the get.var function

The code chunk below is an R function to extract a variable as a vector out of an sf data frame. arguments: - vname: variable name (as character, in quotes) - df: name of sf data frame returns: - v: vector with values (without a column name)

Show the code
get.var <- function(vname,df) {
  v <- df[vname] %>% st_set_geometry(NULL)
  v <- unname(v[,1])
  return(v)
}

3.5.2.3 Boxmap function

Show the code
var <- get.var("wp_nonfunctional", NGA_wp) 
boxbreaks(var)
[1] -56.5   0.0  14.0  34.0  61.0 131.5 278.0
Show the code
boxmap <- function(vnam, df, 
                   legtitle=NA,
                   mtitle="Box Map",
                   mult=1.5){
  var <- get.var(vnam,df)
  bb <- boxbreaks(var)
  tm_shape(df) +
    tm_polygons() +
  tm_shape(df) +
     tm_fill(vnam,title=legtitle,
             breaks=bb,
             palette="Blues",
          labels = c("lower outlier", 
                     "< 25%", 
                     "25% - 50%", 
                     "50% - 75%",
                     "> 75%", 
                     "upper outlier"))  +
  tm_borders() +
  tm_layout(main.title = mtitle, 
            title.position = c("left",
                               "top"))
}


tmap_mode("plot")
tmap mode set to plotting
Show the code
boxmap("wp_nonfunctional", NGA_wp)
Warning: Breaks contains positive and negative values. Better is to use
diverging scale instead, or set auto.palette.mapping to FALSE.